Standardised Evaluation of English Noun Compound Interpretation
Authors
Su Nam Kim and Timothy Baldwin
Abstract
We present a tagged corpus for English noun compound interpretation and describe the method used to generate it. To collect noun compounds, we extracted binary noun compounds (i.e. noun-noun pairs) by looking for sequences of two nouns in the part-of-speech (POS) tagged data of the Wall Street Journal. We then manually filtered out all noun compounds that were incorrectly tagged or included proper nouns. This left us with a data set of 2,169 noun compounds, which we annotated using the set of 20 semantic relations defined by Barker and Szpakowicz (1998), allowing the annotators to assign multiple semantic relations where necessary. The initial inter-annotator agreement was 52.31%. The final data set is split into 1,081 test noun compounds and 1,088 training noun compounds.
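The extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions: each sentence is given as a list of (word, Penn Treebank tag) pairs; only maximal runs of exactly two noun tokens count as binary compounds (longer noun sequences are skipped); and the manual proper-noun filter is approximated by discarding any pair containing an NNP or NNPS tag. The function name extract_binary_compounds and the example sentences are hypothetical.

```python
from typing import List, Tuple

TaggedSentence = List[Tuple[str, str]]

# Penn Treebank noun tags: NN/NNS are common nouns, NNP/NNPS are proper nouns.
COMMON_NOUN_TAGS = {"NN", "NNS"}
PROPER_NOUN_TAGS = {"NNP", "NNPS"}
ALL_NOUN_TAGS = COMMON_NOUN_TAGS | PROPER_NOUN_TAGS


def extract_binary_compounds(sentences: List[TaggedSentence]) -> List[Tuple[str, str]]:
    """Return (modifier, head) pairs from maximal runs of exactly two nouns.

    Assumption: runs of three or more nouns are not binary compounds and are
    skipped; pairs with a proper-noun tag are dropped (the paper filtered
    proper nouns by hand instead).
    """
    compounds = []
    for sent in sentences:
        i = 0
        while i < len(sent):
            if sent[i][1] not in ALL_NOUN_TAGS:
                i += 1
                continue
            # Advance j to the end of this maximal run of noun tokens.
            j = i
            while j < len(sent) and sent[j][1] in ALL_NOUN_TAGS:
                j += 1
            run = sent[i:j]
            if len(run) == 2 and all(tag in COMMON_NOUN_TAGS for _, tag in run):
                compounds.append((run[0][0].lower(), run[1][0].lower()))
            i = j
    return compounds


# Hypothetical WSJ-style tagged sentences.
example = [
    [("The", "DT"), ("stock", "NN"), ("market", "NN"), ("fell", "VBD"), (".", ".")],
    [("Wall", "NNP"), ("Street", "NNP"), ("traders", "NNS"), ("sold", "VBD"),
     ("bank", "NN"), ("shares", "NNS"), (".", ".")],
    [("Apple", "NNP"), ("shares", "NNS"), ("rose", "VBD"), (".", ".")],
]
print(extract_binary_compounds(example))  # → [('stock', 'market'), ('bank', 'shares')]
```

A tag-based filter like this cannot catch tokens the tagger got wrong, which is why a manual pass over the candidates, as described in the abstract, is still needed.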
Similar Resources
Standardised Evaluation of English Noun Compound Interpretation (Su Nam Kim and Timothy Baldwin); Interpreting Compound Nominalisations (Jeremy Nicholson and Timothy Baldwin); Paraphrasing Verbs for Noun Compound Interpretation
This paper describes a dataset which provides the platform for a task on the extraction of English verb particle constructions with basic valence information.
Noun Compound Interpretation Using Paraphrasing Verbs: Feasibility Study
The paper addresses an important challenge for the automatic processing of English written text: understanding noun compounds’ semantics. Following Downing (1977) [1], we define noun compounds as sequences of nouns acting as a single noun, e.g., bee honey, apple cake, stem cell, etc. In our view, they are best characterised by the set of all possible paraphrasing verbs that can connect the targ...
Paraphrasing Verbs for Noun Compound Interpretation
An important challenge for the automatic analysis of English written text is the abundance of noun compounds: sequences of nouns acting as a single noun. In our view, their semantics is best characterized by the set of all possible paraphrasing verbs, with associated weights, e.g., malaria mosquito is carry (23), spread (16), cause (12), transmit (9), etc. Using Amazon’s Mechanical Turk, we col...
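The weighted-verb representation described in this abstract maps naturally onto a frequency table. The sketch below is an illustration only, not code from the cited paper: it stores the four verbs quoted for "malaria mosquito" in a Python Counter and normalises the counts into relative weights. The helper name relative_weights and the normalisation step are assumptions; since the quoted list ends in "etc.", the weights cover only the listed verbs.

```python
from collections import Counter

# Paraphrasing-verb counts for "malaria mosquito", as quoted in the abstract.
# Only the four listed verbs are included; the original list continues ("etc.").
malaria_mosquito = Counter({"carry": 23, "spread": 16, "cause": 12, "transmit": 9})


def relative_weights(verb_counts: Counter) -> dict:
    """Normalise raw verb counts so they sum to 1 (an illustrative choice)."""
    total = sum(verb_counts.values())
    return {verb: count / total for verb, count in verb_counts.items()}


weights = relative_weights(malaria_mosquito)
print(malaria_mosquito.most_common(2))  # → [('carry', 23), ('spread', 16)]
print(round(weights["carry"], 3))       # → 0.383
```

Keeping the raw counts in a Counter preserves the ranking shown in the abstract, while the normalised form makes compounds with different numbers of annotator responses easier to compare.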
Interpreting noun compounds using paraphrases
Noun compounds are abundant in English and their interpretation is crucial for many natural language processing tasks. We propose a method for automatic two-noun noun compound interpretation that searches for suitable paraphrases in static corpora and then issues Web search engine queries to validate them. Native speakers were recruited to evaluate the returned paraphrases for noun compounds: t...
Interpreting Compound Noun Phrases Using Web Search Queries
A weakly-supervised method is applied to anonymized queries to extract lexical interpretations of compound noun phrases (e.g., “fortune 500 companies”). The interpretations explain the subsuming role (“listed in”) that modifiers (fortune 500) play relative to heads (companies) within the noun phrases. Experimental results over evaluation sets of noun phrases from multiple sources demonstrate th...